Lesson 4. Information Encoding Systems

Lesson Objective

Be able to convert letters into binary using ASCII.
Understand the purpose of Unicode.
Define the terms bit, nibble, byte, kilobyte, megabyte, gigabyte and terabyte using standard prefixes.

Lesson Notes

Bit Patterns

Bit patterns play a crucial role in representing various types of data. Whether it's text, images, sound, or integers, everything is ultimately translated into binary form: combinations of 1s and 0s. Let's explore how different data types are converted into these bit patterns:

Text (Characters):

When you press a key on your keyboard, it needs to be transformed into a binary number so that the computer can process it and display the corresponding character on the screen.

The ASCII code (American Standard Code for Information Interchange) assigns a unique binary number to each character. For instance:

The letter 'a' corresponds to the binary number 0110 0001 (decimal 97).
The letter 'b' corresponds to 0110 0010 (decimal 98).
The letter 'c' corresponds to 0110 0011 (decimal 99).

The ASCII code covers uppercase and lowercase letters, digits, punctuation, special characters, and control characters such as carriage return. It can represent 128 characters, which is enough for most English text but falls short for other languages.
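The conversion from a character to its bit pattern can be tried out in Python. This is a minimal sketch, assuming Python 3; the helper name `to_ascii_binary` is made up for this example and is not part of any library.

```python
def to_ascii_binary(text):
    """Return the 8-bit binary pattern for each ASCII character in text."""
    patterns = []
    for ch in text:
        code = ord(ch)  # ord() gives the character's numeric code
        if code > 127:
            raise ValueError(f"{ch!r} is not an ASCII character")
        patterns.append(format(code, "08b"))  # pad to 8 binary digits
    return patterns


# Show the decimal code and bit pattern for the letters used above
for ch, bits in zip("abc", to_ascii_binary("abc")):
    print(ch, ord(ch), bits)
```

The patterns are padded to 8 digits here to match the 0110 0001 style used above; plain ASCII only needs 7 of them.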

Images:

Images are typically represented as bitmap graphics. Each image consists of tiny squares called pixels.

The color of each pixel is encoded using binary values. For example:

Black pixel: 00000000 (no color)
White pixel: 11111111 (full color)

Additionally, the color depth (the number of bits used for each pixel) determines the range of colors an image can display. A higher color depth allows more colors and therefore more vibrant and detailed images.
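As a rough illustration, the pixel patterns above can be read as numbers, and the number of available colors follows from the color depth. This sketch assumes each pixel stores a single value of `color_depth` bits; both function names are invented for this example.

```python
def pixel_value(bits):
    """Convert a pixel's bit pattern (a string of 0s and 1s) to a number."""
    return int(bits, 2)


def colors_available(color_depth):
    """Number of different values a pixel can hold with color_depth bits."""
    return 2 ** color_depth


print(pixel_value("00000000"))  # black pixel -> 0
print(pixel_value("11111111"))  # white pixel -> 255
print(colors_available(8))      # 256 possible values per pixel
```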

Sound:

Sound is captured through sampling. Analog sound waves are converted into digital samples.

The bit depth (number of bits per sample) affects sound quality. More bits provide greater accuracy.

Common audio formats (like MP3) use bit patterns to represent sound data.

Here are some practical examples of file sizes:

One Character File: A single character (e.g., 'a') occupies 1 byte.
Full Page of Text: Approximately 30 kilobytes (kB). That's the size of a typical text document.
DVD: Approximately 4.5 gigabytes (GB). DVDs store movies, software, or large data files.
Hard Disk: A whopping 1 terabyte (TB). Modern hard drives offer massive storage capacity for files, applications, and more.
Small Digital Color Photograph: Roughly 3 megabytes (MB). This size accommodates a decent-quality photo.
Music CD: About 600 MB. A standard audio CD holds around 74 minutes of music.

Storing Values in Binary

With only 2 values (0,1), we need to understand how many values we can make with n bits.

“There are 2ⁿ possible variations with n bits”

If we have 2 bits, we can arrange them in 2² = 4 different ways:

00, 01, 10, 11

Bits	2ⁿ	Combinations
1	2¹	2
2	2²	4
3	2³	8
4	2⁴	16
5	2⁵	32
6	2⁶	64
7	2⁷	128
8	2⁸	256
...	...	...
16	2¹⁶	65,536
32	2³²	4,294,967,296
64	2⁶⁴	18,446,744,073,709,551,616
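The rule in the table can be checked by listing every pattern for a small number of bits. This is a short sketch using Python's standard `itertools.product`; the function name `bit_patterns` is just a label chosen for this example.

```python
from itertools import product


def bit_patterns(n):
    """List every possible pattern of n bits as strings of 0s and 1s."""
    return ["".join(bits) for bits in product("01", repeat=n)]


for n in range(1, 4):
    patterns = bit_patterns(n)
    # The number of patterns found should always equal 2 to the power n
    print(n, len(patterns), 2 ** n, patterns)
```

Listing patterns is only practical for small n; for larger values it is enough to calculate `2 ** n` directly.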

Storage Prefix

Computers process and store large amounts of bytes, often in the order of millions or billions.

When dealing with large quantities it is more convenient to summarise this using number prefixes.

A common example of this is the kilogram (kg) which is equivalent to 1000 grams (g).

When describing quantities of bytes we use either binary prefixes (powers of 2) or decimal prefixes (powers of 10).

Base 2 Binary (10, 20, 30 - x1024)
Unit	2ⁿ	Value
kibibyte (KiB)	2¹⁰	1,024
mebibyte (MiB)	2²⁰	1,048,576
gibibyte (GiB)	2³⁰	1,073,741,824
tebibyte (TiB)	2⁴⁰	1,099,511,627,776
pebibyte (PiB)	2⁵⁰	1,125,899,906,842,624
exbibyte (EiB)	2⁶⁰	1,152,921,504,606,846,976

Base 10 Decimal (3, 6, 9 - x1000) - Metric
Unit	10ⁿ	Value
kilobyte (kB)	10³	1,000
megabyte (MB)	10⁶	1,000,000
gigabyte (GB)	10⁹	1,000,000,000
terabyte (TB)	10¹²	1,000,000,000,000
petabyte (PB)	10¹⁵	1,000,000,000,000,000
exabyte (EB)	10¹⁸	1,000,000,000,000,000,000

The same number prefixes for decimal values can be used to summarise large quantities of bytes.

Common prefixes include:

kilobyte (kB) = 10³ = 1,000
megabyte (MB) = 10⁶ = 1,000,000
gigabyte (GB) = 10⁹ = 1,000,000,000
terabyte (TB) = 10¹² = 1,000,000,000,000

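Traditionally, computer scientists used these same number prefixes to refer to groups of bytes based on powers of 2.

These are not the same as their decimal equivalents.

For example, a 'kilobyte' traditionally meant 1,024 bytes (2¹⁰) rather than 1,000 bytes (10³).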

To eliminate this confusion, in 1998 the International Electrotechnical Commission (IEC) established different prefixes to represent multiples of base 2:

kibibyte (KiB) = 2¹⁰ = 1,024
mebibyte (MiB) = 2²⁰ = 1,048,576
gibibyte (GiB) = 2³⁰ = 1,073,741,824
tebibyte (TiB) = 2⁴⁰ = 1,099,511,627,776

Storage devices use binary (base 2) numbers, so the binary prefixes are more accurate.
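To see the two prefix systems side by side, a byte count can be scaled by either 1000 or 1024. The following is an illustrative sketch only; `describe_bytes` and the two unit lists are names invented for this example.

```python
DECIMAL_UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB"]
BINARY_UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]


def describe_bytes(n, binary=False):
    """Express n bytes using decimal (x1000) or binary (x1024) prefixes."""
    step = 1024 if binary else 1000
    units = BINARY_UNITS if binary else DECIMAL_UNITS
    value = float(n)
    index = 0
    # Keep dividing until the value fits under one step of the next unit
    while value >= step and index < len(units) - 1:
        value /= step
        index += 1
    return f"{value:.2f} {units[index]}"


print(describe_bytes(1_000_000))               # decimal prefixes
print(describe_bytes(1_000_000, binary=True))  # same quantity, binary prefixes
```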

Units of Measurement

In the metric system, we have straightforward conversions: 1 kilometer (km) equals 1000 meters (m), and 1 kilogram (kg) equals 1000 grams (g). These powers of 10 make sense and are easy to remember.

However, when it comes to digital storage, things get interesting. Computers use binary (base 2) for everything. So, when they measure storage, they work in powers of 2. A kilobyte (kB) is commonly understood to be 1000 bytes. However, there's a historical twist.

In binary terms, a kilobyte was traditionally treated as 1024 bytes (2¹⁰). This discrepancy arises because 1024 is the power of 2 closest to 1000. To address this, modern terminology introduces the kibibyte (KiB). A kibibyte is precisely 1024 bytes, matching the binary reality. Meanwhile, a kilobyte (kB) remains 1000 bytes (as per the metric system).
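The gap between the two definitions is small for kilobytes but grows with each larger prefix. A quick sketch of that calculation, where `prefix_gap_percent` is a made-up helper name:

```python
def prefix_gap_percent(power):
    """Percentage by which a binary unit exceeds its decimal counterpart.

    power=1 compares KiB with kB, power=2 compares MiB with MB, and so on.
    """
    binary = 2 ** (10 * power)
    decimal = 10 ** (3 * power)
    return (binary / decimal - 1) * 100


pairs = ["kibi vs kilo", "mebi vs mega", "gibi vs giga", "tebi vs tera"]
for power, name in enumerate(pairs, start=1):
    print(f"{name}: {prefix_gap_percent(power):.1f}% larger")
```

A kibibyte is 2.4% larger than a kilobyte, while a tebibyte is almost 10% larger than a terabyte.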

Binary patterns can be used to represent any data, be it text, sound, images or video. The number of possible bit combinations or patterns available increases as the number of bits increases.

Unit Conversion Table

Use this table to help you convert units. It's not a great table, I'm working on it...

Div	Unit	Mult
0	Bit	0
0	byte	5
3	kilobyte	2
0	megabytes	0
0	gigabytes	5
3	terabyte	2
3	petabyte	2

Lesson Notes

7 bit ASCII Table

ASCII (pronounced "az-kee" or "ass-key" if American) stands for the American Standard Code for Information Interchange. It serves as a character encoding standard used for electronic communication between computers, telecommunications equipment, and other devices. Here are some key points about ASCII:

Character Encoding: ASCII assigns standard numeric values to letters, numerals, punctuation marks, and other characters commonly used in computers. Each character is represented by a unique numerical code.
128 Values: ASCII has only 128 code values, of which only 95 are printable characters. These include digits (0 to 9), lowercase letters (a to z), uppercase letters (A to Z), and punctuation symbols. The remaining 33 codes are non-printing control characters, such as carriage return and line feed.
Binary Representation: ASCII encodes characters into seven-bit integers. For instance, the lowercase letter "i" is represented by binary 1101001 (hexadecimal 69 or decimal 105).
Evolution and Scope: While modern computer systems have transitioned to Unicode (which has over a million code points), the first 128 Unicode code points align with the original ASCII set. ASCII remains a fundamental foundation for character encoding in computing.

Despite being an American standard, ASCII does not include a code point for the cent symbol (¢) or support English terms with diacritical marks (such as résumé and jalapeño) or proper nouns with diacritical marks (such as Beyoncé).
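Whether a piece of text fits in ASCII can be tested by checking that every character code is below 128. This is a small sketch, assuming Python 3; `is_ascii` is a name chosen for this example.

```python
def is_ascii(text):
    """True if every character has a code in the ASCII range 0 to 127."""
    return all(ord(ch) < 128 for ch in text)


# The letter "i" in decimal, hexadecimal and 7-bit binary
print(ord("i"), hex(ord("i")), format(ord("i"), "07b"))

# Words with diacritical marks fall outside ASCII
for word in ["resume", "résumé", "jalapeño", "¢"]:
    print(word, is_ascii(word))
```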

Binary	Dec	Hex	Char	Binary	Dec	Hex	Char	Binary	Dec	Hex	Char
0100000	32	20	(space)	1000000	64	40	@	1100000	96	60	`
0100001	33	21	!	1000001	65	41	A	1100001	97	61	a
0100010	34	22	"	1000010	66	42	B	1100010	98	62	b
0100011	35	23	#	1000011	67	43	C	1100011	99	63	c
0100100	36	24	$	1000100	68	44	D	1100100	100	64	d
0100101	37	25	%	1000101	69	45	E	1100101	101	65	e
0100110	38	26	&	1000110	70	46	F	1100110	102	66	f
0100111	39	27	'	1000111	71	47	G	1100111	103	67	g
0101000	40	28	(	1001000	72	48	H	1101000	104	68	h
0101001	41	29	)	1001001	73	49	I	1101001	105	69	i
0101010	42	2A	*	1001010	74	4A	J	1101010	106	6A	j
0101011	43	2B	+	1001011	75	4B	K	1101011	107	6B	k
0101100	44	2C	,	1001100	76	4C	L	1101100	108	6C	l
0101101	45	2D	-	1001101	77	4D	M	1101101	109	6D	m
0101110	46	2E	.	1001110	78	4E	N	1101110	110	6E	n
0101111	47	2F	/	1001111	79	4F	O	1101111	111	6F	o
0110000	48	30	0	1010000	80	50	P	1110000	112	70	p
0110001	49	31	1	1010001	81	51	Q	1110001	113	71	q
0110010	50	32	2	1010010	82	52	R	1110010	114	72	r
0110011	51	33	3	1010011	83	53	S	1110011	115	73	s
0110100	52	34	4	1010100	84	54	T	1110100	116	74	t
0110101	53	35	5	1010101	85	55	U	1110101	117	75	u
0110110	54	36	6	1010110	86	56	V	1110110	118	76	v
0110111	55	37	7	1010111	87	57	W	1110111	119	77	w
0111000	56	38	8	1011000	88	58	X	1111000	120	78	x
0111001	57	39	9	1011001	89	59	Y	1111001	121	79	y
0111010	58	3A	:	1011010	90	5A	Z	1111010	122	7A	z
0111011	59	3B	;	1011011	91	5B	[	1111011	123	7B	{
0111100	60	3C	<	1011100	92	5C	\	1111100	124	7C	|
0111101	61	3D	=	1011101	93	5D	]	1111101	125	7D	}
0111110	62	3E	>	1011110	94	5E	^	1111110	126	7E	~
0111111	63	3F	?	1011111	95	5F	_	1111111	127	7F	DEL
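Any row of a table like this can be worked out from the decimal code alone, which is a handy way to check binary and hexadecimal values. A minimal sketch, with `ascii_row` as an invented helper name:

```python
def ascii_row(code):
    """Return (binary, decimal, hex, char) for one ASCII code from 32 to 127."""
    # Code 127 is the non-printing delete character, so label it instead
    char = "DEL" if code == 127 else chr(code)
    return format(code, "07b"), code, format(code, "02X"), char


# Print the first row of each column group, plus the final code
for code in (32, 64, 96, 127):
    print(*ascii_row(code), sep="\t")
```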

8 bit ASCII

8-bit ASCII, also known as Extended ASCII, builds upon the original American Standard Code for Information Interchange (ASCII) system by using 8 binary digits (or bits) for each character instead of 7.

ASCII represents characters using 7 bits (128 code points). However, 8-bit ASCII extends this to 256 characters by utilizing 8 bits per character.
The additional bit allows for a broader range of characters, including special symbols, accented letters, and other language-specific characters.

In summary, 8-bit ASCII enhances the original character encoding by allowing more characters and symbols, making it versatile for different contexts.
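There are several different 8-bit extensions of ASCII, and they do not all agree on codes 128 to 255. The sketch below assumes one common variant, ISO 8859-1 (Latin-1), which Python provides as the `latin-1` codec; `extended_code` is a name invented for this example.

```python
def extended_code(ch, encoding="latin-1"):
    """Return the single-byte code for ch in an 8-bit character set."""
    # encode() raises UnicodeEncodeError if ch is not in the character set
    encoded = ch.encode(encoding)
    return encoded[0]


for ch in "aé¢":
    code = extended_code(ch)
    # Codes above 127 need the eighth bit, so the pattern starts with 1
    print(ch, code, format(code, "08b"))
```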

A Spooky Ghost

        _,.--.
      .'      `-.
     /   O O   \
    |          /
    |         /
    |        /
     \      /
      `.__.'

A Cat

        /\_/\
       ( o.o )
      > ^ <
     /  ---  \
    /         \
   /           \

????

        /\
        /  \
       / o o \
      /   ^   \
     /         \
    /_/-\___\_\

An Apple

        ,--./,-.
        / #      \
       |          |
        \        / 
         `._,._,'

Unicode

Unicode, formally known as The Unicode Standard, is a text encoding standard maintained by the Unicode Consortium. Its purpose is to support the use of text written in all of the world's major writing systems.

Unicode assigns a unique number to every character, regardless of the platform, program, or language. Before Unicode, various character encodings existed, each with limitations. These early encoding methods could not cover all languages and often conflicted with one another. Unicode changed this by providing a consistent way to represent characters across different languages.

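Unicode was originally designed to use 16 bits per character, giving 65,536 possible code points. Modern Unicode defines over a million code points, and encodings such as UTF-8 and UTF-16 use between 8 and 32 bits per character. The calculations in this lesson assume 16 bits per character.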

Here are example characters in the Unicode Character Set:

こんにちは (Japanese)
厦灣 (Chinese)
한국 (Korean)
ćōčīūīū (Hawaiian)
العربية (Arabic)
Hello World (English)
سلام الليكم (Urdu)
বাইলার বালার (Bengali)
हमात वारीन्र (Hindi)
Γεια σου (Greek)
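Each of these characters has its own Unicode number (code point), and the number of bytes it occupies depends on the encoding used to store it. The following sketch looks up a few characters using Python's built-in `ord` and `str.encode`; the function name `describe` is just for illustration.

```python
def describe(ch):
    """Return the code point (U+XXXX) and the UTF-8 and UTF-16 sizes in bytes."""
    code_point = f"U+{ord(ch):04X}"
    utf8_bytes = len(ch.encode("utf-8"))
    utf16_bytes = len(ch.encode("utf-16-be"))  # big-endian, no byte-order mark
    return code_point, utf8_bytes, utf16_bytes


for ch in "Hこ한Γ":
    print(ch, *describe(ch))
```

All four characters fit in 2 bytes (16 bits) in UTF-16, while UTF-8 uses between 1 and 3 bytes for them.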

How to work out file size of text...

number of characters x bits used to represent each character = file size in bits

Let's work out the file size of the phrase: Social Distancing

17 characters x 7 bit ASCII = 119 bits
17 characters x 8 bit ASCII = 136 bits
17 characters x 16 bit Unicode = 272 bits
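The same calculation can be written as a short function. This is a sketch of the formula above, assuming every character uses the same fixed number of bits; `text_size_bits` is a made-up name.

```python
def text_size_bits(text, bits_per_char):
    """File size in bits = number of characters x bits per character."""
    return len(text) * bits_per_char  # len() counts the space as a character


phrase = "Social Distancing"
encodings = [("7 bit ASCII", 7), ("8 bit ASCII", 8), ("16 bit Unicode", 16)]
for name, bits in encodings:
    size = text_size_bits(phrase, bits)
    print(f"{len(phrase)} characters x {name} = {size} bits")
```

Dividing the result by 8 gives the size in bytes.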

mrahmedcomputing
